
Google DeepMind Launches On-Device VLA Model for Robotic Devices

Google DeepMind introduces a vision-language-action (VLA) model that runs locally on robotic devices, enhancing dexterity and task adaptation without network connectivity.

Jun 24, 2025 | Source: Visive.ai

Google DeepMind has unveiled a vision-language-action (VLA) model designed to run locally on robotic devices. The new Gemini Robotics On-Device model offers general-purpose dexterity and fast task adaptation, marking a significant step forward for robotics.

Key Features of Gemini Robotics On-Device

The Gemini Robotics On-Device model is designed to operate independently of any data network. According to Carolina Parada, Senior Director and Head of Robotics at Google DeepMind, this capability is crucial for latency-sensitive applications and ensures robustness in environments with intermittent or zero connectivity.

General-Purpose Dexterity

Building on the task generalization and dexterity of the original Gemini Robotics model, the new release is tailored for bi-arm robots. It is built for rapid experimentation with dexterous manipulation and can be adapted to new tasks through fine-tuning.

Task Adaptation

The model can perform a wide range of tasks, including unzipping bags, folding clothes, zipping a lunchbox, drawing a card, pouring salad dressing, and assembling products. It is also the first VLA model from Google DeepMind available for fine-tuning, allowing developers to enhance performance with as few as 50 to 100 demonstrations.

Advantages and Applications

Parada emphasizes that while many tasks will work out of the box, developers can further adapt the model to achieve better performance for specific applications. The model's ability to quickly adapt to new tasks demonstrates its foundational knowledge and potential for widespread use.

Latency and Connectivity

Operating independently of a data network, the Gemini Robotics On-Device model is ideal for applications where latency is a critical concern. It ensures robust performance in environments with limited or no connectivity, making it suitable for a variety of real-world scenarios.

Market Context and Future Potential

The launch of Gemini Robotics On-Device aligns with the growing trend of integrating AI and natural language processing into robotics. This trend is particularly prominent in Silicon Valley, where large language models are giving robots the capability to understand and execute complex tasks.

Multimodal Capabilities

Google DeepMind's decision to make Gemini multimodal—capable of handling text, images, and audio—reflects a strategic move toward better reasoning and a broader range of applications. This multimodal approach could pave the way for new consumer products and innovations in the tech industry.

Competitive Landscape

Several other companies are also developing AI-powered robots capable of general tasks, contributing to a competitive and rapidly evolving market. Google DeepMind's advancements in Gemini Robotics highlight the company's commitment to pushing the boundaries of what robots can achieve.

Clear Takeaway

The introduction of the Gemini Robotics On-Device model by Google DeepMind represents a significant leap for robots that run their AI locally. With its focus on general-purpose dexterity and task adaptation, the model has the potential to transform industries and applications from manufacturing to consumer products.

Frequently Asked Questions

What is the Gemini Robotics On-Device model?

The Gemini Robotics On-Device model is a vision-language-action (VLA) model developed by Google DeepMind that runs locally on robotic devices, enhancing dexterity and task adaptation without network connectivity.

How does the model operate without a data network?

The model is designed to operate independently of any data network, making it suitable for latency-sensitive applications and environments with limited or no connectivity.

What tasks can the Gemini Robotics On-Device model perform?

The model can perform a variety of tasks, including unzipping bags, folding clothes, zipping a lunchbox, drawing a card, pouring salad dressing, and assembling products.

What is the significance of the model being available for fine-tuning?

Being available for fine-tuning allows developers to enhance the model's performance for specific applications with as few as 50 to 100 demonstrations, demonstrating its adaptability and foundational knowledge.

How does this model fit into the broader landscape of AI and robotics?

The Gemini Robotics On-Device model aligns with the growing trend of integrating AI and natural language processing into robotics, contributing to a competitive and rapidly evolving market.
